26 research outputs found

    Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

    Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication hiding strategy, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper studies the influence of local rounding errors that are introduced by the additional recurrences in the pipelined Conjugate Gradient method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare it to the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy on the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG, while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems.
    Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
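
The residual gap mentioned in this abstract can be illustrated with a minimal sketch. It runs standard (not pipelined) CG on a small dense test matrix and records the norm of the difference between the true residual `b - A x` and the recursively updated residual `r`. The function names, the 1D Laplacian test problem, and the fixed-period `replace_every` rule are assumptions made for illustration; the paper's automated replacement criterion is not reproduced here.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def laplacian_1d(n):
    """Hypothetical SPD test matrix: tridiagonal (-1, 2, -1)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def cg_with_gap(A, b, iters, replace_every=None):
    """Standard CG that also records ||(b - A x) - r|| at every iteration.

    If replace_every is set, the recursive residual is overwritten by the
    true residual every replace_every iterations. This fixed period is a
    naive stand-in (an assumption), not an automated criterion.
    """
    x = [0.0] * len(b)
    r = list(b)            # r_0 = b - A x_0 with x_0 = 0
    p = list(r)
    rr = dot(r, r)
    gaps = []
    for k in range(1, iters + 1):
        if rr == 0.0:
            break          # exact convergence, nothing left to do
        s = matvec(A, p)
        pAp = dot(p, s)
        if pAp <= 0.0:
            break          # breakdown guard
        alpha = rr / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]   # recursive residual
        true_r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        gaps.append(math.sqrt(sum((t - ri) ** 2 for t, ri in zip(true_r, r))))
        if replace_every and k % replace_every == 0:
            r = true_r     # residual replacement
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, r, gaps
```

Computing the true residual at every step costs an extra matrix-vector product, so this is a diagnostic sketch only; a practical method would estimate the gap instead of measuring it.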

    On rounding error resilience, maximal attainable accuracy and parallel performance of the pipelined Conjugate Gradients method for large-scale linear systems in PETSc

    Pipelined Krylov solvers typically display better strong scaling compared to standard Krylov methods for large linear systems. The synchronization bottleneck is mitigated by overlapping time-consuming global communications with computations. To achieve this hiding of communication, pipelined methods feature additional recurrence relations on auxiliary variables. This paper analyzes why rounding error effects have a significantly larger impact on the accuracy of pipelined algorithms. An algebraic model for the accumulation of rounding errors in the (pipelined) CG algorithm is derived. Furthermore, an automated residual replacement strategy is proposed to reduce the effect of rounding errors on the final solution. MPI parallel performance tests implemented in PETSc on an Intel Xeon X5660 cluster show that the pipelined CG method with automated residual replacement is more resilient to rounding errors while maintaining the efficient parallel performance obtained by pipelining.

    On soft errors in the Conjugate Gradient method: sensitivity and robust numerical detection: Sur les soft-erreurs dans la méthode du Gradient Conjugué: sensibilité et détection numérique robuste

    The conjugate gradient (CG) method is the most widely used iterative scheme for the solution of large sparse systems of linear equations when the matrix is symmetric positive definite. Although more than sixty years old, it is still a serious candidate for extreme-scale computation on large computing platforms. On the technological side, the continuous shrinking of transistor geometry and the increasing complexity of these devices dramatically affect their sensitivity to natural radiation, and thus diminish their reliability. One of the most common effects produced by natural radiation is the single event upset, which consists of a bit-flip in a memory cell producing unexpected results at the application level. Consequently, future computing facilities at extreme scale might be more prone to errors of any kind, including bit-flips during calculation. These numerical and technological observations are the main motivations for this work, where we first investigate through extensive numerical experiments the sensitivity of CG to bit-flips in its main computationally intensive kernels, namely the matrix-vector product and the preconditioner application. We further propose numerical criteria to detect the occurrence of such faults; we assess their robustness through extensive numerical experiments.

    Analysis of rounding error accumulation in Conjugate Gradients to improve the maximal attainable accuracy of pipelined CG

    Pipelined Krylov solvers typically offer better scalability in the strong scaling limit compared to standard Krylov methods. The synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations in the algorithm. However, to achieve this communication hiding strategy, pipelined methods feature multiple recurrence relations on additional auxiliary variables to update the guess for the solution. This paper studies the influence of rounding errors on the convergence of the pipelined Conjugate Gradient method. It analyzes why rounding effects have a significantly larger impact on the maximal attainable accuracy of the pipelined CG algorithm compared to the traditional CG method. Furthermore, an algebraic model for the accumulation of rounding errors throughout the (pipelined) CG algorithm is derived. Based on this rounding error model, we then propose an automated residual replacement strategy to reduce the effect of rounding errors on the final iterative solution. The resulting pipelined CG method with automated residual replacement improves the maximal attainable accuracy of pipelined CG to a precision comparable to that of standard CG, while maintaining the efficient parallel performance of the pipelined method.

    On soft errors in the conjugate gradient method: sensitivity and robust numerical detection

    The conjugate gradient (CG) method is the most widely used iterative scheme for the solution of large sparse systems of linear equations when the matrix is symmetric positive definite. Although more than 60 years old, it is still a serious candidate for extreme-scale computations on large computing platforms. On the technological side, the continuous shrinking of transistor geometry and the increasing complexity of these devices dramatically affect their sensitivity to natural radiation and thus diminish their reliability. One of the most common effects produced by natural radiation is the single event upset, which consists of a bit-flip in a memory cell producing unexpected results at the application level. Consequently, future extreme-scale computing facilities will be more prone to errors of any kind, including bit-flips, during their calculations. These numerical and technological observations are the main motivations for this work, where we first investigate through extensive numerical experiments the sensitivity of CG to bit-flips in its main computationally intensive kernels, namely the matrix-vector product and the preconditioner application. We further propose numerical criteria to detect the occurrence of such soft errors and assess their robustness through extensive numerical experiments.
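
To make the notion of a bit-flip concrete, here is a minimal sketch of how a single bit of an IEEE 754 double can be flipped in software, for example to inject a fault into one entry of a matrix-vector product. The helper name `flip_bit` and the bit-numbering convention (bit 0 is the least significant mantissa bit, bit 63 the sign) are assumptions for illustration, not the fault-injection setup used in the paper.

```python
import struct


def flip_bit(value, bit):
    """Return `value` with one bit of its 64-bit IEEE 754 encoding flipped.

    bit 0      : least significant mantissa bit
    bits 52-62 : exponent
    bit 63     : sign
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 63]")
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped


# The impact depends strongly on which bit is hit:
low = flip_bit(1.0, 0)     # last mantissa bit: change of one unit roundoff
sign = flip_bit(1.0, 63)   # sign bit: 1.0 becomes -1.0
expo = flip_bit(1.0, 62)   # top exponent bit: 1.0 becomes infinity
```

A flip in a low mantissa bit is of the same order as ordinary rounding error, whereas a flip in the exponent can change the value by many orders of magnitude, which is why sensitivity studies typically distinguish between bit positions.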

    A complementary note on soft errors in the Conjugate Gradient method: the persistent error case

    This note is a follow-up study to [1], where we studied the resilience of the preconditioned conjugate gradient method (PCG). We complement the original work by performing a similar series of numerical experiments, but using what we called persistent instead of transient bit-flips.

    On soft errors in the Conjugate Gradient method: sensitivity and robust numerical detection -revised

    The conjugate gradient (CG) method is the most widely used iterative scheme for the solution of large sparse systems of linear equations when the matrix is symmetric positive definite. Although more than sixty years old, it is still a serious candidate for extreme-scale computation on large computing platforms. On the technological side, the continuous shrinking of transistor geometry and the increasing complexity of these devices dramatically affect their sensitivity to natural radiation, and thus diminish their reliability. One of the most common effects produced by natural radiation is the single event upset, which consists of a bit-flip in a memory cell producing unexpected results at the application level. Consequently, future computing facilities at extreme scale might be more prone to errors of any kind, including bit-flips during calculation. These numerical and technological observations are the main motivations for this work, where we first investigate through extensive numerical experiments the sensitivity of CG to bit-flips in its main computationally intensive kernels, namely the matrix-vector product and the preconditioner application. We further propose numerical criteria to detect the occurrence of such soft errors; we assess their robustness through extensive numerical experiments.

    Reliability of Checksum based Detection for Soft Errors in Conjugate Gradient Variants

    Soft errors that are not detected by hardware mechanisms may be extremely complex to detect at the software layer. One option is to perform a full duplication of the computation (and data) and check on a regular basis that intermediate results are consistent. However, this mechanism may be prohibitively expensive. In the context of a CG solver, the most costly operation to duplicate is the SpMV. To avoid the duplication of this operation, checksum mechanisms may be employed. In this presentation, we investigate the reliability of such an approach in finite precision arithmetic. We illustrate our discussion with the CGPOP code, a miniapp for performing the CG within the Parallel Ocean Program (POP), which is a candidate for exascale climate simulations.
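
The checksum idea for the matrix-vector product can be sketched as follows, assuming the common column-sum variant: with c = 1ᵀA precomputed once, the sum of the entries of y = A x should equal c · x up to rounding. The function names, the dense storage, the tolerance, and the `corrupt` fault-injection argument are all assumptions for illustration; this is not the CGPOP implementation.

```python
def matvec(A, x):
    """Dense stand-in for SpMV on a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def column_checksum(A):
    """c = 1^T A, computed once before the iteration starts."""
    return [sum(col) for col in zip(*A)]


def checked_matvec(A, c, x, tol=1e-10, corrupt=None):
    """Compute y = A x and test sum(y) against c . x.

    corrupt=(i, delta) adds delta to y[i] after the product, as a
    hypothetical way to emulate a soft error. In finite precision the
    two sums never agree exactly, so a relative tolerance is required.
    """
    y = matvec(A, x)
    if corrupt is not None:
        i, delta = corrupt
        y[i] += delta
    lhs = sum(y)
    rhs = sum(ci * xi for ci, xi in zip(c, x))
    scale = sum(abs(v) for v in y) + 1.0
    return y, abs(lhs - rhs) <= tol * scale


n = 20
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
c = column_checksum(A)
x = [float(i + 1) for i in range(n)]

_, ok_clean = checked_matvec(A, c, x)                      # no fault
_, ok_large = checked_matvec(A, c, x, corrupt=(5, 1e-3))   # detected
_, ok_tiny = checked_matvec(A, c, x, corrupt=(5, 1e-14))   # slips through
```

The last case shows the reliability question raised in the abstract: an error smaller than the tolerance needed to absorb rounding noise passes the check, so the choice of tolerance trades false alarms against missed faults.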

    On Resiliency in Krylov Solvers

    In this talk we will discuss possible numerical remedies to survive data loss in some numerical linear algebra solvers, namely Krylov subspace linear solvers and some widely used eigensolvers. Assuming that a separate mechanism ensures fault detection, we propose numerical algorithms to extract relevant information from the available data after a fault. After data extraction, a well-chosen part of the missing data is regenerated through interpolation strategies to provide meaningful inputs for a numerical restart. We will also present some preliminary investigations addressing soft error detection, again at the application level, in the conjugate gradient framework.